Characterization of somatic structural variations in 528 Chinese individuals with esophageal squamous cell carcinoma

Esophageal squamous cell carcinoma (ESCC) demonstrates high genome instability. Here, we analyze 528 whole genomes to investigate the mechanisms and biological functions of structural variations (SVs). SVs show multimodal size distributions, indicating distinct mutational processes. We develop a tool and define five types of complex rearrangements with templated insertions. We highlight a type of fold-back inversion that is associated with poor outcomes. Distinct rearrangement signatures differ in genomic metrics such as replication timing, spatial proximity, and chromatin accessibility. Specifically, fold-back inversion tends to occur near the centrosome; TD-c2 (tandem duplication-cluster 2) is significantly enriched in chromatin-accessible and early-replicating regions compared to other signatures. Analyses of the TD-c2 signature reveal 9 TD hotspots, among which we identify a hotspot containing a super-enhancer of PTHLH. We confirm the oncogenic effect of the PTHLH gene and its interaction with enhancers through functional experiments. Finally, extrachromosomal circular DNAs (ecDNAs) are present in 14% of ESCCs and show strong selective advantages for driver genes.
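As a rough illustration of how simple SV classes and fold-back inversions can be told apart from breakpoint junctions, the sketch below classifies junctions by strand orientation. It flags inversion-type junctions whose two breakpoints lie close together and that have no reciprocal partner. The `Junction` record, the strand convention (`+-` deletion, `-+` tandem duplication, `++`/`--` inversion), and the 30 kb threshold are assumptions made for illustration; they are not taken from the tool developed in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """One breakpoint junction; assumes pos1 <= pos2 when on the same chromosome."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-" (hypothetical convention, see lead-in)
    chrom2: str
    pos2: int
    strand2: str


def classify(j: Junction) -> str:
    """Assign a simple SV class from chromosome pairing and strand orientation."""
    if j.chrom1 != j.chrom2:
        return "translocation"
    return {
        "+-": "deletion",
        "-+": "tandem_duplication",
        "++": "inversion",
        "--": "inversion",
    }[j.strand1 + j.strand2]


def is_fold_back(j: Junction, junctions: list, max_span: int = 30_000) -> bool:
    """Flag an inversion junction with a short span and no reciprocal partner.

    max_span is an illustrative threshold, not a value from the study.
    """
    if classify(j) != "inversion" or abs(j.pos2 - j.pos1) > max_span:
        return False
    opposite = "-" if j.strand1 == "+" else "+"
    for other in junctions:
        if other is j or classify(other) != "inversion":
            continue
        # A nearby inversion junction of opposite orientation suggests a
        # balanced (reciprocal) inversion rather than a fold-back.
        if (
            other.chrom1 == j.chrom1
            and other.strand1 == opposite
            and abs(other.pos1 - j.pos1) <= max_span
            and abs(other.pos2 - j.pos2) <= max_span
        ):
            return False
    return True


# Toy junctions (made-up coordinates, for demonstration only).
calls = [
    Junction("chr11", 69_000_000, "+", "chr11", 69_004_000, "+"),  # isolated, short span
    Junction("chr5", 1_000, "+", "chr5", 9_000, "+"),              # reciprocal pair, part 1
    Junction("chr5", 1_200, "-", "chr5", 9_300, "-"),              # reciprocal pair, part 2
    Junction("chr3", 1_000_000, "+", "chr3", 5_000_000, "+"),      # long-span inversion
    Junction("chr8", 100, "-", "chr8", 50_000, "+"),               # tandem duplication
]
fold_backs = [j for j in calls if is_fold_back(j, calls)]
```

In this toy input only the isolated chr11 junction is flagged. A real pipeline would also need read support, copy-number context, and caller-specific orientation conventions, none of which are modeled here.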


Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy

) and HRA002508 (WGS & Nanopore, https://ngdc.cncb.ac.cn/gsa-human/browse/HRA002508)). The raw sequencing data are available under controlled access due to data privacy laws related to patient consent for data sharing, and the data should be used for research purposes only. Access can be obtained with approval from the respective Data Access Committee (DAC) in the GSA-human database. According to the guidelines of GSA-human, all non-profit researchers are allowed access to the data, and the Principal Investigator of any research group may apply for controlled access to the data. For data requests, please refer to the detailed guide: https://ngdc.cncb.ac.cn/gsa-human/document/GSA-Human_Request_Guide_for_Users_us.pdf. The DAC will respond within two weeks. The data will be made available within a week after access is granted and will remain available for download for one year. The human reference genome used in this paper is version hg19 (https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/).

We performed whole-genome sequencing on regional tumor samples and adjacent normal tissues from 528 ESCC patients, of which 133 pairs were also sequenced by RNA-seq. In addition, we performed supplementary WGS and Nanopore sequencing on two ESCC samples. ESCC regional tumor samples that produced low-quality sequencing data were excluded.
For all experiments, at least three independent experiments were performed, and each experiment was performed in triplicate. Results were consistent across all replicates.
The ESCC patients were recruited at random to form the cohort. All studies of ESCC patients were retrospective and did not require random grouping. Cells were allocated to experimental groups at random.
Investigators were blinded to the group allocation during cell implantation or sample/data collection.
Antibodies used for IHC were validated using negative and positive controls and underwent an optimization process including titration, variation of the antigen retrieval process and incubation periods, and further testing in a select set of clinical samples before they were applied to large-scale cohorts.
The ESCC cell lines KYSE180, KYSE150 and KYSE450 were purchased from the Cell Bank of Type Culture Collection of Chinese